fix(db): use paginated_update for viz migration #20761

Merged
merged 1 commit into from
Jul 19, 2022

Conversation

ktmud
Member

@ktmud ktmud commented Jul 19, 2022

SUMMARY

The DB migration introduced by #20359 failed to complete in the Airbnb environment and threw this error on our MySQL database:

INFO  [alembic.runtime.migration] Running upgrade c747c78868b6 -> 06e1e70058c7, Migrating legacy Area
Upgrading (1/21668): another#127
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1819, in _execute_context
    self.dialect.do_execute(
  File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/default.py", line 732, in do_execute
    cursor.execute(statement, parameters)
  File "/usr/local/lib/python3.9/dist-packages/MySQLdb/cursors.py", line 183, in execute
    while self.nextset():
  File "/usr/local/lib/python3.9/dist-packages/MySQLdb/cursors.py", line 137, in nextset
    nr = db.next_result()
MySQLdb._exceptions.ProgrammingError: (2014, "Commands out of sync; you can't run this command now")

This is because updating objects while iterating with iter_per does not work with MySQL cursors. I guess the MySQL client just doesn't like writing while reading. (Other databases may have similar issues; could anyone with more than 1,000 area charts test?)

We've seen this error before, which is why paginated_update was introduced. Instead of streaming results with cursors, paginated_update runs manual pagination with OFFSET and LIMIT, which makes sure read and write operations happen independently.
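To illustrate the idea of paginating with OFFSET and LIMIT so that reads and writes never overlap, here is a minimal sketch using the standard-library sqlite3 module. It is not Superset's paginated_update implementation: the function name paginated_update_sketch, the slices table layout, and the batch size are all assumptions made for this example.

```python
import sqlite3
from typing import Callable


def paginated_update_sketch(
    conn: sqlite3.Connection,
    transform: Callable[[str], str],
    batch_size: int = 2,
) -> int:
    """Hypothetical helper: rewrite `params` for every 'area' row, page by page."""
    offset = 0
    updated = 0
    while True:
        # Read one page and materialize it with fetchall(), so no read
        # cursor is still streaming results when the writes start.
        rows = conn.execute(
            "SELECT id, params FROM slices WHERE viz_type = ? "
            "ORDER BY id LIMIT ? OFFSET ?",
            ("area", batch_size, offset),
        ).fetchall()
        if not rows:
            break
        # Write the whole page back, then commit before reading the next page.
        conn.executemany(
            "UPDATE slices SET params = ? WHERE id = ?",
            [(transform(params), row_id) for row_id, params in rows],
        )
        conn.commit()
        updated += len(rows)
        offset += batch_size
    return updated


# Small in-memory demo with made-up data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE slices (id INTEGER PRIMARY KEY, viz_type TEXT, params TEXT)"
)
conn.executemany(
    "INSERT INTO slices (viz_type, params) VALUES (?, ?)",
    [("area", "a"), ("area", "b"), ("treemap", "c"),
     ("area", "d"), ("area", "e"), ("area", "f")],
)
conn.commit()
updated = paginated_update_sketch(conn, str.upper)
```

The sketch only rewrites params and leaves viz_type alone. If a write changed a column used in the WHERE clause, rows would drop out of the filtered set and a fixed OFFSET could skip some of them, so a real migration needs to account for that.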

While optimizing this, I also did some other refactoring for the migration_viz class. Most notably, I relocated the files (and the corresponding tests) from superset root to superset.migrations as migration code should be as self-contained and as stable as possible. Anything under the root directory is considered app code, and app code can be updated much more frequently.

While testing, I also noticed that some charts fail when reloading query_context because the combined JSON payload is too large for a default Text column in MySQL (which has a max size of 64 KB). The upgrade was not able to save the full serialized JSON string, so the downgrade would then fail. We need to migrate both Slice.query_context and Slice.params to MediumText, which I will address in another PR.
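As a rough illustration of the size limit involved, the sketch below checks which MySQL text type a serialized payload would fit into. The byte limits are MySQL's documented maximums for TEXT and MEDIUMTEXT; the function name smallest_mysql_text_type and the sample payloads are hypothetical and not part of this PR.

```python
import json
from typing import Any, Dict

MYSQL_TEXT_MAX_BYTES = 2**16 - 1        # 65,535 bytes
MYSQL_MEDIUMTEXT_MAX_BYTES = 2**24 - 1  # 16,777,215 bytes


def smallest_mysql_text_type(payload: Dict[str, Any]) -> str:
    """Hypothetical helper: smallest MySQL text type that holds the JSON."""
    # MySQL limits are in bytes, so measure the encoded string, not len(str).
    size = len(json.dumps(payload).encode("utf-8"))
    if size <= MYSQL_TEXT_MAX_BYTES:
        return "TEXT"
    if size <= MYSQL_MEDIUMTEXT_MAX_BYTES:
        return "MEDIUMTEXT"
    return "LONGTEXT"


# Made-up payloads: one tiny, one a bit over 100,000 bytes when serialized.
small = {"viz_type": "area"}
large = {"queries": ["x" * 1000] * 100}
small_type = smallest_mysql_text_type(small)
large_type = smallest_mysql_text_type(large)
```

A check like this could be run against existing rows before a migration to estimate how many charts would be affected.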

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

N/A

TESTING INSTRUCTIONS

Covered by CI and tested locally with Airbnb DB instances.

The area chart migration was finished in ~20 seconds for about

ADDITIONAL INFORMATION

  • Has associated issue: feat: Area viz migration #20359 feat: TreeMap migration #20346
  • Required feature flags:
  • Changes UI
  • Includes DB Migration (follow approval process in SIP-59)
    • Migration is atomic, supports rollback & is backwards-compatible
    • Confirm DB migration upgrade and downgrade tested
    • Runtime estimates and downtime expectations provided
  • Introduces new feature or API
  • Removes existing feature or API

@ktmud ktmud requested a review from a team as a code owner July 19, 2022 04:04
@ktmud ktmud changed the title fix(db): use paginated_update for area chart migration fix(db): use paginated_update for viz migration Jul 19, 2022
@ktmud ktmud force-pushed the use-paginated-update branch 2 times, most recently from 0f8233c to 7484f25 Compare July 19, 2022 04:18
@@ -45,11 +63,11 @@ def _migrate(self) -> None:

         rv_data = {}
         for key, value in self.data.items():
-            if key in self.mapping_keys and self.mapping_keys[key] in rv_data:
+            if key in self.rename_keys and self.rename_keys[key] in rv_data:
                 raise ValueError("Duplicate key in target viz")
Member Author

I'm not sure if we should raise an error here. Maybe just silently override?
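The two options being weighed here can be sketched side by side. This is only an illustration under assumed names: rename_keys_sketch, the on_duplicate parameter, and the sample keys are hypothetical and not taken from the migration code; only the duplicate check and the error message mirror the snippet above.

```python
from typing import Any, Dict


def rename_keys_sketch(
    data: Dict[str, Any],
    rename_keys: Dict[str, str],
    on_duplicate: str = "raise",
) -> Dict[str, Any]:
    """Hypothetical helper: rename keys, raising or overriding on collision."""
    rv_data: Dict[str, Any] = {}
    for key, value in data.items():
        if key in rename_keys:
            target = rename_keys[key]
            # Same check as in the diff: the renamed key already exists.
            if target in rv_data and on_duplicate == "raise":
                raise ValueError("Duplicate key in target viz")
            # With on_duplicate="override", the later value silently wins.
            rv_data[target] = value
        else:
            rv_data[key] = value
    return rv_data


# Made-up form data in which the rename target is already present.
form_data = {"new_name": 1, "old_name": 2}
rename = {"old_name": "new_name"}
overridden = rename_keys_sketch(form_data, rename, on_duplicate="override")
```

Raising stops the migration at the first conflicting chart, while overriding keeps it running at the cost of dropping one of the two values without any trace.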

     source_viz_type: str
     target_viz_type: str

     def __init__(self, form_data: str) -> None:
-        self.data = json.loads(form_data)
+        self.data = try_load_json(form_data)
Member Author

Data may be corrupted, let's always try/catch just to be safe.
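A defensive loader along these lines might look like the sketch below. The body of try_load_json is not shown in this conversation, so this is an assumption about its general shape rather than the actual helper; the name try_load_json_sketch and the choice to fall back to an empty dict are made up for illustration.

```python
import json
from typing import Any, Dict, Optional


def try_load_json_sketch(data: Optional[str]) -> Dict[str, Any]:
    """Hypothetical helper: parse JSON, falling back to {} on bad input."""
    if not data:
        # None or an empty string: nothing to parse.
        return {}
    try:
        loaded = json.loads(data)
    except json.JSONDecodeError:
        # Corrupted payload: skip it instead of aborting the whole migration.
        return {}
    # Form data is expected to be a JSON object; ignore anything else.
    return loaded if isinstance(loaded, dict) else {}


good = try_load_json_sketch('{"viz_type": "area"}')
bad = try_load_json_sketch('{"viz_type": ')
empty = try_load_json_sketch(None)
```

Returning an empty dict keeps the calling code simple, although it also hides which rows were corrupted unless the failure is logged somewhere.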

class MigrateVizEnum(str, Enum):
    # the Enum member name is viz_type in database
    treemap = "treemap"
    area = "area"
Member Author

No need for such an enum since we will import different stuff for different migrations anyway.

@codecov

codecov bot commented Jul 19, 2022

Codecov Report

Merging #20761 (0e0f124) into master (e60083b) will decrease coverage by 11.73%.
The diff coverage is 49.31%.

❗ Current head 0e0f124 differs from pull request most recent head b67a49f. Consider uploading reports for the commit b67a49f to get more accurate results

@@             Coverage Diff             @@
##           master   #20761       +/-   ##
===========================================
- Coverage   66.35%   54.62%   -11.74%     
===========================================
  Files        1754     1756        +2     
  Lines       66689    66721       +32     
  Branches     7049     7049               
===========================================
- Hits        44253    36446     -7807     
- Misses      20639    28478     +7839     
  Partials     1797     1797               
Flag      Coverage Δ
hive      ?
mysql     ?
postgres  ?
presto    53.14% <49.31%> (+0.05%) ⬆️
python    57.48% <49.31%> (-24.21%) ⬇️
sqlite    ?
unit      50.28% <4.10%> (-0.29%) ⬇️

Flags with carried forward coverage won't be shown.

Impacted Files                                          Coverage Δ
superset/migrations/shared/utils.py                     32.78% <37.50%> (-0.55%) ⬇️
...perset/migrations/shared/migrate_viz/processors.py   45.83% <45.83%> (ø)
superset/migrations/shared/migrate_viz/base.py          40.96% <52.50%> (ø)
superset/migrations/shared/migrate_viz/__init__.py      100.00% <100.00%> (ø)
superset/utils/dashboard_import_export.py               0.00% <0.00%> (-100.00%) ⬇️
superset/key_value/commands/update.py                   0.00% <0.00%> (-88.89%) ⬇️
superset/key_value/commands/delete.py                   0.00% <0.00%> (-85.30%) ⬇️
superset/db_engines/hive.py                             0.00% <0.00%> (-85.19%) ⬇️
superset/key_value/commands/delete_expired.py           0.00% <0.00%> (-80.77%) ⬇️
superset/dashboards/commands/importers/v0.py            15.62% <0.00%> (-76.25%) ⬇️
... and 280 more

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update e60083b...b67a49f.

Member

@zhaoyongjie zhaoyongjie left a comment

Thanks for improving migration in the production environment. LGTM

Member

@michael-s-molina michael-s-molina left a comment

LGTM. Thanks for the fix and detailed context @ktmud!

@ktmud ktmud merged commit e2d3ea8 into apache:master Jul 19, 2022
@john-bodley john-bodley deleted the use-paginated-update branch February 17, 2023 22:16
@mistercrunch mistercrunch added 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 2.1.0 labels Mar 13, 2024
Labels
🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels size/L 🚢 2.1.0
4 participants